A Data Stream Publish/Subscribe Architecture with Self-adapting Queries

Authors

  • Alasdair J. G. Gray
  • Werner Nutt
Abstract

In data stream applications, streams typically arise from a geographically distributed collection of producers and may be queried by consumers, which may be distributed as well. In such a setting, a query can be seen as a subscription asking to be informed of all tuples that satisfy a specific condition. We propose to support the publishing and querying of distributed data streams with a publish/subscribe architecture. Enabling such a system to scale to a large number of producers and consumers requires the introduction of republishers, which collect data streams and make the merged stream available. If republishers consume from other republishers, a hierarchy of republishers results. We present a formalism that allows distributed data streams, published by independent stream producers, to be integrated as views on a mediated schema. We use the formalism to develop methods for adapting query plans to changes in the set of available data streams and for allowing consumers to dynamically change which streams they subscribe to.
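The roles named in the abstract (producers, consumers whose queries act as subscriptions with a condition, and republishers that merge streams) can be illustrated with a minimal in-memory sketch. This is not the authors' formalism or implementation: the class names `Producer` and `Republisher`, the dictionary-based tuples, and the `latency_ms` attribute are all assumptions made for illustration, and the sketch leaves out the mediated schema and query-plan adaptation entirely.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch: a stream tuple is a plain dict.
StreamTuple = Dict[str, object]
Condition = Callable[[StreamTuple], bool]
Callback = Callable[[StreamTuple], None]


class Producer:
    """Publishes tuples; each subscription is a (condition, callback) pair."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscriptions: List[Tuple[Condition, Callback]] = []

    def subscribe(self, condition: Condition, callback: Callback) -> None:
        # A query is treated as a subscription: "inform me of every
        # tuple that satisfies this condition".
        self._subscriptions.append((condition, callback))

    def publish(self, tup: StreamTuple) -> None:
        # Deliver the tuple to every subscriber whose condition it satisfies.
        for condition, callback in self._subscriptions:
            if condition(tup):
                callback(tup)


class Republisher(Producer):
    """Consumes from several sources and republishes the merged stream.

    Because a Republisher is itself a Producer, republishers can consume
    from other republishers, which yields a hierarchy.
    """

    def __init__(self, name: str, sources: List[Producer]) -> None:
        super().__init__(name)
        for source in sources:
            # Subscribe to everything each source publishes and forward it.
            source.subscribe(lambda tup: True, self.publish)


def demo() -> List[StreamTuple]:
    site_a = Producer("site_a")
    site_b = Producer("site_b")
    merged = Republisher("merged", [site_a, site_b])

    received: List[StreamTuple] = []
    # Consumer query: all tuples with latency above 100 ms (hypothetical attribute).
    merged.subscribe(lambda tup: tup["latency_ms"] > 100, received.append)

    site_a.publish({"site": "a", "latency_ms": 250})
    site_b.publish({"site": "b", "latency_ms": 40})
    return received
```

Calling `demo()` returns only the tuple from `site_a`, since the tuple from `site_b` does not satisfy the consumer's condition. A consumer here subscribes once to the republisher instead of once per producer, which is the scaling idea the abstract describes.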

Similar resources

Leveraging Distributed Publish/Subscribe Systems for Scalable Stream Query Processing

Existing distributed publish/subscribe systems (DPSS) offer loosely coupled and easy to deploy content-based stream delivery services to a large number of users. However, the lack of query expressiveness limits their application scope. On the other hand, distributed stream processing engines (DSPE) provide efficient processing services for complex stream queries. Nevertheless, these systems are...

Scalable Delivery of Stream Query Results

Continuous queries over data streams typically produce large volumes of result streams. To scale up the system, one should carefully study the problem of delivering the result streams to the end users, which, unfortunately, is often overlooked in existing systems. In this paper, we leverage Distributed Publish/Subscribe System (DPSS), a scalable data dissemination infrastructure, for efficient ...

Top-k/w publish/subscribe: A publish/subscribe model for continuous top-k processing over data streams

Continuous processing of top-k queries over data streams is a promising technique for alleviating the information overload problem as it distinguishes relevant from irrelevant data stream objects with respect to a given scoring function over time. Thus it enables filtering of irrelevant data objects and delivery of top-k objects relevant to user interests in real-time. We propose a solution for...

Building a Replicated Logging System with Apache Kafka

Apache Kafka is a scalable publish-subscribe messaging system with its core architecture as a distributed commit log. It was originally built at LinkedIn as its centralized event pipelining platform for online data integration tasks. Over the past years developing and operating Kafka, we extend its log-structured architecture as a replicated logging backbone for much wider application scopes in...

SOPS: A System for Efficient Processing of Spatial-Keyword Publish/Subscribe

Massive amount of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date geo-textual objects (e.g., geo-tagged Tweets) such that their locations meet users’ need and their texts are interesting to users. For example, a user may want to be update...

Publication date: 2005